APPLICATION OF SMIRNOV WORDS TO WAITING 
TIME DISTRIBUTIONS OF RUNS 

UTA FREIBERG, CLEMENS HEUBERGER, AND HELMUT PRODINGER 


Abstract. Consider infinite random words over a finite alphabet 
where the letters occur as an i.i.d. sequence according to some 
arbitrary distribution on the alphabet. The expectation and the 
variance of the waiting time for the first completed h-run of any 
letter (i.e., the first occurrence of h consecutive equal letters) are 
computed. 

The expected waiting time for the completion of h-runs of j 
arbitrary distinct letters is also given. 


1. Introduction 

In [7], the following paradox is presented: In measuring the regularity 
of a die one may use waiting times for sequences of the same side of 
certain lengths. For example, if one throws a regular six-sided die, it 
takes 7 throws on average to get the same number twice in succession and 43 
throws to get a number three times in succession. Heuristically, one 
would expect that a smaller number of throws is needed to get such 
sequences with a biased die. This suggests calling one die more regular 
than another if more throws are needed to get sequences of one side of 
a certain length. Now the paradox is that there exist dice, say A and B, 
where the mean waiting time for two digits in a row is longer for die A 
while the mean waiting time for three digits in a row is longer for die B 
(an example has been given by Mori, see [7, p. 62]). The consequence of 
this paradox is that one cannot use the mean waiting times for such runs 
as a (sufficient) criterion for the definition of regularity of a die (or 
of any other random sequence of digits from a finite alphabet). 

This paradox gave motivation to calculate first and second moments 
of such waiting times for so-called h-runs. In particular, the formula 
for the first moment of the waiting time for the first completed h-run 
of any digit, which was already given in [7], is proved without 
using the strong law of large numbers or any other limit theorem (see 
Theorem 1). Moreover, the variance of the waiting time for the first 
completed h-run is presented in the same theorem. We then compute 
the expected waiting time for the completion of h-runs of j different 
letters in Theorem 2. In particular, for j = r (the number of possible 
letters), we get results about the waiting time for a full collection of runs. 

Our fundamental technique is the calculation of generating functions 
of such waiting times; our main trick is the combination of two very 
useful observations: Firstly, we make use of the very simple but crucial 
identity (1) (see [1]) which already has been a powerful tool in the 
treatment of the coupon collector problem and/or the birthday paradox. 
Secondly, we use the generating function of Smirnov words (see [2]) 
to count words with a limited number of repetitions of single letters 
using an appropriate substitution. 

We conclude the paper in Section 5 with an algorithmic approach 
for specific situations. 


2. Preliminaries 

We consider infinite words $X_1 X_2 \dots$ over the alphabet $\mathcal{A} = \{1, \dots, r\}$ 
where the random variables $X_i$ are i.i.d. with $\mathbb{P}\{X_i = k\} = p_k > 0$ for 
some $p_1, \dots, p_r$. 

We say that a letter $\ell \in \mathcal{A}$ has an h-run in $X_1 \dots X_n$ if there are 
h consecutive letters $\ell$ in the word $X_1 \dots X_n$, or in other words, if the 
word $\ell^h = \ell\ell \dots \ell$ (with h repetitions) is a factor of the word $X_1 \dots X_n$. 

We consider the random variable $B_j$ giving the first position n such 
that there exist j of the r letters having an h-run in $X_1 \dots X_n$. This is 
a random variable on the infinite product space consisting of all infinite 
words endowed with the product measure. 

On the other hand, we consider the random variable $Y_n$ counting the 
number of letters which had an h-run in $X_1 \dots X_n$. This is a random 
variable on the finite product space consisting of all words of length n, 
again with its product measure. 

By construction, we have 

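(1)  $\mathbb{P}\{Y_n \ge j\} = \mathbb{P}\{B_j \le n\},$

cf. [1, Eqn. (6)]. As a consequence, we obtain (cf. [1, Eqn. (7)]) 

(2)  $\mathbb{E}(B_j) = \sum_{n \ge 0} \mathbb{P}\{B_j > n\} = \sum_{n \ge 0} \mathbb{P}\{Y_n < j\} = \sum_{q=0}^{j-1} \sum_{n \ge 0} \mathbb{P}\{Y_n = q\}.$

With the generating function 

(3)  $G_j(z) = \sum_{n \ge 0} \mathbb{P}\{Y_n < j\} z^n,$

this amounts to 

$\mathbb{E}(B_j) = G_j(1).$

To compute the variance, we first note that 

\begin{align*}
\mathbb{E}(B_j^2) &= \sum_{n \ge 0} n^2 \mathbb{P}\{B_j = n\} = \sum_{n \ge 0} n^2 \bigl( \mathbb{P}\{B_j > n - 1\} - \mathbb{P}\{B_j > n\} \bigr) \\
&= \sum_{n \ge 0} (n + 1)^2 \mathbb{P}\{B_j > n\} - \sum_{n \ge 0} n^2 \mathbb{P}\{B_j > n\} \\
&= \sum_{n \ge 0} (2n + 1) \mathbb{P}\{B_j > n\} = \sum_{n \ge 0} (2n + 1) \mathbb{P}\{Y_n < j\} \\
&= 2G_j'(1) + G_j(1),
\end{align*}

where we used (1) and the definition of $G_j(z)$ given in (3). We conclude 
that 

(4)  $\mathbb{V}(B_j) = \mathbb{E}(B_j^2) - \mathbb{E}(B_j)^2 = 2G_j'(1) + G_j(1) - G_j(1)^2.$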
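The tail-sum identities behind (2) and (4) can be checked on any distribution with finite support. The following Python sketch is only an illustration, not part of the original derivation: the helper names `moments_direct` and `moments_from_tail` and the test distribution (one roll of a fair die, used purely as a stand-in for a waiting time) are assumptions made here.

```python
from fractions import Fraction


def moments_direct(pmf):
    """Mean and variance computed directly from a pmf {n: P{B = n}}."""
    mean = sum(n * q for n, q in pmf.items())
    second = sum(n * n * q for n, q in pmf.items())
    return mean, second - mean**2


def moments_from_tail(pmf):
    """Mean and variance via G(1) and G'(1), where G(z) = sum_n P{B > n} z^n."""
    n_max = max(pmf)
    # tail[n] = P{B > n} for n = 0, ..., n_max (it vanishes beyond the support)
    tail = [sum(q for m, q in pmf.items() if m > n) for n in range(n_max + 1)]
    g = sum(tail)                                      # G(1)  = E(B)
    g_prime = sum(n * t for n, t in enumerate(tail))   # G'(1) = sum_n n P{B > n}
    return g, 2 * g_prime + g - g * g                  # variance as in (4)


# Hypothetical test distribution: uniform on {1, ..., 6}.
pmf = {n: Fraction(1, 6) for n in range(1, 7)}
print(moments_direct(pmf), moments_from_tail(pmf))
```

Both functions should return the same pair, here mean 7/2 and variance 35/12.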

A Smirnov word is defined to be any word which has no consecutive 
equal letters. The ordinary generating function of Smirnov words over 
the alphabet A is 

(5)  $S(v_1, \dots, v_r) = \dfrac{1}{1 - \sum_{i=1}^{r} \frac{v_i}{1 + v_i}},$

where $v_i$ counts the number of occurrences of the letter i, cf. Flajolet 
and Sedgewick [2, Example III.24]. 
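As a small sanity check of the Smirnov generating function, one can set every $v_i$ equal to $z$, so that the coefficient of $z^n$ counts Smirnov words of length n. The Python sketch below (function names are chosen here for illustration) expands $1/(1 - r z/(1+z))$ as a power series and compares it with a brute-force count; the brute force enumerates all $r^n$ words and is only meant for small n.

```python
from itertools import product


def count_smirnov_brute_force(r, n):
    """Count words of length n over r letters with no two equal adjacent letters."""
    return sum(
        all(a != b for a, b in zip(w, w[1:]))
        for w in product(range(r), repeat=n)
    )


def smirnov_series(r, n_max):
    """Coefficients of S(z, ..., z) = 1 / (1 - r * z / (1 + z)) up to z^n_max."""
    # r * z / (1 + z) = sum_{k >= 1} r * (-1)^(k+1) * z^k
    c = [0] + [r * (-1) ** (k + 1) for k in range(1, n_max + 1)]
    s = [1]
    for n in range(1, n_max + 1):
        # 1 / (1 - C(z)) satisfies s_n = sum_{k=1}^{n} c_k * s_{n-k}
        s.append(sum(c[k] * s[n - k] for k in range(1, n + 1)))
    return s


print(smirnov_series(3, 6))
print([count_smirnov_brute_force(3, n) for n in range(7)])
```

Both lines should list the same numbers, namely 1 followed by $r(r-1)^{n-1}$ for $n \ge 1$.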

3. Moments of the first h-run 

In this section, we study the first occurrence of any h-run. In the 
framework of Section 2, this corresponds to the case j = 1 and the 
random variable $B_1$. 

We prove the following result on the expectation and the variance of $B_1$: 

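Theorem 1. If $p_i < 1$ for $1 \le i \le r$, the expectation and the variance 
of the first occurrence of an h-run are 

(6)  $\mathbb{E}(B_1) = \dfrac{1}{\sum_{i=1}^{r} \frac{1}{p_i^{-1} + \dots + p_i^{-h}}}$

and 

(7)  $\mathbb{V}(B_1) = \dfrac{\sum_{i=1}^{r} \left( \frac{p_i + p_i^h}{1 - p_i^h} - 2h \frac{p_i^h (1 - p_i)}{(1 - p_i^h)^2} \right)}{\left( \sum_{i=1}^{r} \frac{1}{p_i^{-1} + \dots + p_i^{-h}} \right)^2}.$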
The result (6) on the expectation also appears (without proof) in [7, 
p. 62]. Each summand of the numerator of (7) is indeed non-negative, 
because this is equivalent to 

$\dfrac{p_i + p_i^h}{2} \cdot \dfrac{1 + p_i + \dots + p_i^{h-1}}{h} \ge p_i^h,$

which is true by the inequality between the arithmetic and the geometric 
mean, applied to both factors. 
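The closed forms of Theorem 1 are easy to evaluate exactly. The following Python sketch is an illustration with a function name chosen here; it uses the expectation $1/\sum_i 1/(p_i^{-1} + \dots + p_i^{-h})$ and the variance in the form $\mathbb{E}(B_1)^2 \sum_i \bigl( (p_i + p_i^h)/(1 - p_i^h) - 2h\, p_i^h (1 - p_i)/(1 - p_i^h)^2 \bigr)$ obtained at the end of the proof, and assumes that the probabilities are passed as exact fractions below 1.

```python
from fractions import Fraction


def first_run_moments(p, h):
    """Expectation and variance of the first h-run of any letter (Theorem 1)."""
    p = [Fraction(x) for x in p]
    # sum_i 1 / (p_i^-1 + ... + p_i^-h)
    inv_mean = sum(1 / sum(x ** (-k) for k in range(1, h + 1)) for x in p)
    mean = 1 / inv_mean
    numerator = sum(
        (x + x**h) / (1 - x**h) - 2 * h * x**h * (1 - x) / (1 - x**h) ** 2
        for x in p
    )
    return mean, numerator * mean**2


die = [Fraction(1, 6)] * 6
print(first_run_moments(die, 2))
print(first_run_moments(die, 3))
```

For the fair die this reproduces the mean waiting times 7 and 43 quoted in the introduction.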

Proof of Theorem 1. In the case j = 1, (2) reads 

(8)  $\mathbb{E}(B_1) = \sum_{n \ge 0} \mathbb{P}\{Y_n = 0\}.$

Thus we have to determine the probability that a word of length n 
does not have any h-run. Such words arise from a Smirnov word by 
replacing single letters by runs of length in {1,..., h — 1} of the same 
letter. 

In terms of generating functions, this corresponds to replacing each 
$v_i$ by 

$p_i z + (p_i z)^2 + \dots + (p_i z)^{h-1} = \dfrac{p_i z - (p_i z)^h}{1 - p_i z}.$

Here, z marks the length of the word. We obtain 

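\begin{align*}
G_1(z) = \sum_{n \ge 0} \mathbb{P}\{Y_n = 0\} z^n &= S\left( \frac{p_1 z - (p_1 z)^h}{1 - p_1 z}, \dots, \frac{p_r z - (p_r z)^h}{1 - p_r z} \right) \\
&= \frac{1}{1 - \sum_{i=1}^{r} \dfrac{\frac{p_i z - (p_i z)^h}{1 - p_i z}}{1 + \frac{p_i z - (p_i z)^h}{1 - p_i z}}} = \frac{1}{1 - \sum_{i=1}^{r} \frac{p_i z - (p_i z)^h}{1 - (p_i z)^h}}.
\end{align*}

By (8), we are only interested in z = 1: 

$\mathbb{E}(B_1) = \sum_{n \ge 0} \mathbb{P}\{Y_n = 0\} = G_1(1) = \dfrac{1}{1 - \sum_{i=1}^{r} \frac{p_i - p_i^h}{1 - p_i^h}}.$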

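Replacing the summand 1 in the denominator by $p_1 + \dots + p_r$ yields 

\begin{align*}
\mathbb{E}(B_1) &= \frac{1}{\sum_{i=1}^{r} \left( p_i - \frac{p_i - p_i^h}{1 - p_i^h} \right)} = \frac{1}{\sum_{i=1}^{r} \frac{p_i - p_i^{h+1} - p_i + p_i^h}{1 - p_i^h}} \\
&= \frac{1}{\sum_{i=1}^{r} \frac{p_i^h (1 - p_i)}{1 - p_i^h}} = \frac{1}{\sum_{i=1}^{r} \frac{1}{p_i^{-1} + \dots + p_i^{-h}}}.
\end{align*}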


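For the variance, we compute $G_1'(1)$ as 

\begin{align*}
G_1'(1) &= \mathbb{E}(B_1)^2 \sum_{i=1}^{r} \frac{(p_i - h p_i^h)(1 - p_i^h) + (p_i - p_i^h) h p_i^h}{(1 - p_i^h)^2} \\
&= \mathbb{E}(B_1)^2 \sum_{i=1}^{r} \frac{p_i - h p_i^h - p_i^{h+1} + h p_i^{2h} + h p_i^{h+1} - h p_i^{2h}}{(1 - p_i^h)^2} \\
&= \mathbb{E}(B_1)^2 \sum_{i=1}^{r} \frac{p_i (1 - p_i^h) - h p_i^h (1 - p_i)}{(1 - p_i^h)^2} \\
&= \mathbb{E}(B_1)^2 \left( \sum_{i=1}^{r} \frac{p_i}{1 - p_i^h} - h \sum_{i=1}^{r} \frac{p_i^h (1 - p_i)}{(1 - p_i^h)^2} \right).
\end{align*}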

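By (4), we obtain 

\begin{align*}
\mathbb{V}(B_1) &= 2G_1'(1) + G_1(1) - G_1(1)^2 \\
&= \mathbb{E}(B_1)^2 \left( \frac{1}{\mathbb{E}(B_1)} - 1 + 2 \sum_{i=1}^{r} \frac{p_i}{1 - p_i^h} - 2h \sum_{i=1}^{r} \frac{p_i^h (1 - p_i)}{(1 - p_i^h)^2} \right) \\
&= \mathbb{E}(B_1)^2 \left( \sum_{i=1}^{r} \frac{-p_i + p_i^{h+1} + p_i^h - p_i^{h+1} + 2p_i}{1 - p_i^h} - 2h \sum_{i=1}^{r} \frac{p_i^h (1 - p_i)}{(1 - p_i^h)^2} \right) \\
&= \mathbb{E}(B_1)^2 \left( \sum_{i=1}^{r} \frac{p_i + p_i^h}{1 - p_i^h} - 2h \sum_{i=1}^{r} \frac{p_i^h (1 - p_i)}{(1 - p_i^h)^2} \right).
\end{align*}

Together with (6), we obtain (7). 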


□ 
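The generating function used in this proof, $G_1(z) = 1/\bigl(1 - \sum_i (p_i z - (p_i z)^h)/(1 - (p_i z)^h)\bigr)$, can be compared coefficient by coefficient with the probabilities that a word of length n contains no h-run. The Python sketch below is an illustration only; the function names are chosen here, and the brute-force part enumerates all $r^n$ words, so it is meant for small n.

```python
from fractions import Fraction
from itertools import product


def g1_coefficients(p, h, n_max):
    """Power series coefficients of G_1(z) up to z^n_max, in exact arithmetic."""
    # With x = p_i z: (x - x^h) / (1 - x^h) = sum_{k >= 0} (x^(hk+1) - x^(hk+h)).
    c = [Fraction(0)] * (n_max + 1)
    for x in p:
        k = 0
        while h * k + 1 <= n_max:
            c[h * k + 1] += x ** (h * k + 1)
            if h * k + h <= n_max:
                c[h * k + h] -= x ** (h * k + h)
            k += 1
    g = [Fraction(1)]
    for n in range(1, n_max + 1):
        # 1 / (1 - C(z)) satisfies g_n = sum_{k=1}^{n} c_k * g_{n-k}
        g.append(sum(c[k] * g[n - k] for k in range(1, n + 1)))
    return g


def prob_no_run(p, h, n):
    """P{Y_n = 0}: probability that no letter has an h-run in a word of length n."""
    total = Fraction(0)
    for w in product(range(len(p)), repeat=n):
        longest = run = 0
        prev = None
        for a in w:
            run = run + 1 if a == prev else 1
            longest = max(longest, run)
            prev = a
        if longest < h:
            weight = Fraction(1)
            for a in w:
                weight *= p[a]
            total += weight
    return total


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(g1_coefficients(p, 2, 4))
print([prob_no_run(p, 2, n) for n in range(5)])
```

The two printed lists should agree.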


4. Expectation of the first occurrence of h-runs of j letters 

In this section, we consider the first position where j of the letters 1, 
..., r had an h-run. In the terminology of Section 2, this corresponds 
to the random variable $B_j$. 

We prove the following theorem on the expectation of Bj. 
Theorem 2. For $i \in \mathcal{A}$, let 

(9)  $\alpha_i := \dfrac{p_i - p_i^h}{1 - p_i}, \qquad \gamma_i := \dfrac{p_i}{1 - p_i},$

and let $A_i$ and $\Gamma_i$ be the substitution operators mapping the variable $v_i$ 
to $\alpha_i$ and $\gamma_i$, respectively. 

Then the expectation of the first occurrence of h-runs of exactly j 
letters is 

(10)  $\mathbb{E}(B_j) = \sum_{q=0}^{j-1} [y^q] \prod_{i=1}^{r} \bigl( y \Gamma_i + (1 - y) A_i \bigr) S(v_1, \dots, v_r),$

where $S(v_1, \dots, v_r)$ is defined in (5). 
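The operator expression of Theorem 2 can be evaluated mechanically. As the proof shows, extracting $[y^q]$ from $\prod_i (y\Gamma_i + (1-y)A_i)$ gives a sum over subsets M of size q of $\prod_{i \in M} (\Gamma_i - A_i) \prod_{i \notin M} A_i$, and each factor $\Gamma_i - A_i$ can be expanded further. The Python sketch below follows this expansion literally; it is an illustration only, with function names chosen here, exponential running time in r, and the assumption that all $p_i$ are exact fractions below 1.

```python
from fractions import Fraction
from itertools import combinations


def smirnov(values):
    """S(v_1, ..., v_r) = 1 / (1 - sum_i v_i / (1 + v_i))."""
    return 1 / (1 - sum(v / (1 + v) for v in values))


def expected_wait(p, h, j):
    """E(B_j): expected first position at which j letters have had an h-run."""
    r = len(p)
    alpha = [(x - x**h) / (1 - x) for x in p]   # value substituted by A_i
    gamma = [x / (1 - x) for x in p]            # value substituted by Gamma_i
    total = 0
    for q in range(j):
        for M in combinations(range(r), q):
            # Expand prod_{i in M} (Gamma_i - A_i): letters in L receive A_i with
            # a sign, the remaining letters of M receive Gamma_i.
            for size in range(q + 1):
                for L in combinations(M, size):
                    vals = [
                        alpha[i] if (i not in M or i in L) else gamma[i]
                        for i in range(r)
                    ]
                    total += (-1) ** size * smirnov(vals)
    return total


die = [Fraction(1, 6)] * 6
print(expected_wait(die, 2, 1), expected_wait(die, 3, 1))
```

For j = 1 this reduces to the expectation of Theorem 1, and for h = 1 it reduces to the classical coupon collector expectations.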


For j = r, i.e., the first occurrence of h-runs of all letters, (10) can 
be simplified: 


Corollary 3. The expectation of the first occurrence of all h-runs is 

(11)  $\mathbb{E}(B_r) = \left( \prod_{i=1}^{r} \Gamma_i - \prod_{i=1}^{r} (\Gamma_i - A_i) \right) S(v_1, \dots, v_r),$

where $\Gamma_i$, $A_i$ and $S(v_1, \dots, v_r)$ are defined in (9) and (5), respectively. 


In the case of equidistributed letters, i.e., pi = 1/r for all i, we get 
the following simple expression. 


Corollary 4. If $p_1 = \dots = p_r = 1/r$, then the expectation of the first 
occurrence of all h-runs is 

$\mathbb{E}(B_r) = \dfrac{r (r^h - 1)}{r - 1} H_r,$

where $H_r$ denotes the rth harmonic number. 


Proof of Theorem 2. As in Section 2, $Y_n$ is the number of letters that 
have at least one run of length $\ge h$ within $X_1 \dots X_n$. 

Arbitrary words arise from Smirnov words by replacing single letters 
by runs of length at least 1 of the same letter. In terms of generating 
functions, this corresponds to substituting $v_i$ by 

\begin{align*}
\beta_i(u_i, z) &:= p_i z + \dots + (p_i z)^{h-1} + u_i \bigl( (p_i z)^h + (p_i z)^{h+1} + \cdots \bigr) \\
&= \frac{p_i z - (p_i z)^h + u_i (p_i z)^h}{1 - p_i z} = \frac{p_i z + (u_i - 1)(p_i z)^h}{1 - p_i z}.
\end{align*}

As previously, z counts the length of the word. The variable $u_i$ counts 
the number of occurrences of (non-extensible) m-runs of the letter i 
with $m \ge h$. 

We now consider the probability generating function 

$F(u_1, \dots, u_r; z) = S(\beta_1(u_1, z), \dots, \beta_r(u_r, z))$

of all words. 

For $M \subseteq \mathcal{A}$, let $E_{n,M}$ be the event that exactly the letters in M have 
an h-run in $X_1 \dots X_n$. By definition, we have 

(12)  $\{Y_n = q\} = \biguplus_{M \subseteq \mathcal{A},\ |M| = q} E_{n,M}$

for $q \in \{0, \dots, r\}$. 

We now compute $\mathbb{P}(E_{n,M})$ for some $M = \{i_1, \dots, i_q\}$ of cardinality q. 
We denote the letters not contained in M by $\mathcal{A} \setminus M = \{s_1, \dots, s_{r-q}\}$. 
By construction of the generating function, we have 

(13)  $\mathbb{P}(E_{n,M}) = [z^n] \sum_{m_1 \ge 1} \cdots \sum_{m_q \ge 1} [u_{i_1}^{m_1}] \cdots [u_{i_q}^{m_q}] [u_{s_1}^{0}] \cdots [u_{s_{r-q}}^{0}] F(u_1, \dots, u_r; z).$

For any power series $H(u)$, we have 

$\sum_{m \ge 1} [u^m] H(u) = H(1) - H(0).$


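We therefore define the operators $\Delta_i$ and $Z_i$ by $\Delta_i H(u_i) = H(1) - H(0)$ 
and $Z_i H(u_i) = H(0)$. With these notations, (13) reads 

(14)  $\mathbb{P}(E_{n,M}) = [z^n] \Bigl( \prod_{i \in M} \Delta_i \prod_{i \notin M} Z_i \Bigr) F(u_1, \dots, u_r; z).$

Inserting this and (12) in (2) yields 

(15)  $\mathbb{E}(B_j) = \sum_{n \ge 0} [z^n] \sum_{M \subseteq \mathcal{A},\ |M| < j} \Bigl( \prod_{i \in M} \Delta_i \prod_{i \notin M} Z_i \Bigr) F(u_1, \dots, u_r; z).$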


Summing over all $n \ge 0$ amounts to setting z = 1 as long as all 
summands are non-singular at z = 1. As |M| < j, at least one of 
the $u_i$ is zero, w.l.o.g. $u_1 = 0$. This implies that $[z^n] F(u_1, \dots, u_r; z) \le 
[z^n] F(0, 1, \dots, 1; z) \le \rho^n$ for a suitable $0 < \rho < 1$ as the word $1^h$ is 
forbidden as a factor. Thus $F(u_1, \dots, u_r; z)$ is regular at z = 1. 

We note that $\beta_i(1, 1) = \gamma_i$ and $\beta_i(0, 1) = \alpha_i$ where $\gamma_i$ and $\alpha_i$ are 
defined in (9). Therefore, for z = 1, the operator $\Delta_i$ can be written as 
$\Gamma_i - A_i$. Similarly, $Z_i$ corresponds to $A_i$. 

We have 

$\sum_{M \subseteq \mathcal{A},\ |M| < j} \prod_{i \in M} (\Gamma_i - A_i) \prod_{i \notin M} A_i = \sum_{q=0}^{j-1} [y^q] \prod_{i=1}^{r} \bigl( y \Gamma_i + (1 - y) A_i \bigr).$


Combining this with (15) yields (10). 


□ 


Proof of Corollary 3. The polynomial $\prod_{i=1}^{r} (y \Gamma_i + (1 - y) A_i)$ has degree 
r in the variable y. Thus extracting all coefficients but the coefficient 
of $y^r$ amounts to substituting y = 1 and subtracting the coefficient of 
$y^r$, i.e., 

$\sum_{q=0}^{r-1} [y^q] \prod_{i=1}^{r} \bigl( y \Gamma_i + (1 - y) A_i \bigr) = \prod_{i=1}^{r} \Gamma_i - \prod_{i=1}^{r} (\Gamma_i - A_i).$

Inserting this into (10) yields (11). 


□ 
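Proof of Corollary 4. Setting $p_i = 1/r$ yields 

$\alpha_i = \dfrac{\frac{1}{r} - \left( \frac{1}{r} \right)^h}{1 - \frac{1}{r}}, \qquad \gamma_i = \dfrac{1}{r - 1}, \qquad \dfrac{\gamma_i}{1 + \gamma_i} = \dfrac{1}{r}, \qquad \dfrac{\alpha_i}{1 + \alpha_i} = \dfrac{r^{h-1} - 1}{r^h - 1}.$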

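Inserting this in (11) and collecting terms with k occurrences of $A_i$ 
yields 

\begin{align*}
\mathbb{E}(B_r) &= \sum_{k=1}^{r} (-1)^{k+1} \binom{r}{k} \frac{1}{1 - \frac{r - k}{r} - k \frac{r^{h-1} - 1}{r^h - 1}} \\
&= \frac{r (r^h - 1)}{r - 1} \sum_{k=1}^{r} \binom{r}{k} \frac{(-1)^{k+1}}{k} = \frac{r (r^h - 1)}{r - 1} H_r,
\end{align*}

where we used the identity 

$H_r = \sum_{k=1}^{r} \binom{r}{k} \frac{(-1)^{k+1}}{k},$

cf. [5]. 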


□ 
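The collecting step in this proof can be verified in exact arithmetic. The following Python sketch (function names chosen here, $r \ge 2$ assumed) evaluates the alternating sum over k, in which k letters receive $\alpha_i$ and the remaining $r - k$ letters receive $\gamma_i$, and compares it with the closed form $r(r^h - 1)/(r - 1) \cdot H_r$.

```python
from fractions import Fraction
from math import comb


def harmonic(r):
    """r-th harmonic number H_r as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, r + 1))


def collected_sum(r, h):
    """sum_k (-1)^(k+1) C(r,k) / (1 - (r-k)/r - k (r^(h-1) - 1)/(r^h - 1))."""
    total = Fraction(0)
    for k in range(1, r + 1):
        denominator = (
            1 - Fraction(r - k, r) - k * Fraction(r ** (h - 1) - 1, r**h - 1)
        )
        total += (-1) ** (k + 1) * comb(r, k) / denominator
    return total


def closed_form(r, h):
    """r (r^h - 1) / (r - 1) * H_r, the expression of Corollary 4."""
    return Fraction(r * (r**h - 1), r - 1) * harmonic(r)


print(collected_sum(3, 2), closed_form(3, 2))
```

For h = 1 the closed form is $r H_r$, the expectation of the classical coupon collector problem.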


Remark 5. Let run lengths $h_1, \dots, h_r$ be given and consider occurrences 
of $h_i$-runs for the letter i. If $B_j$ is the first position n such that there 
are exactly j letters which had "their" run in $X_1 \dots X_n$, the results of 
Theorems 1 and 2 as well as Corollary 3 remain valid when all $p_i^h$ are 
replaced by $p_i^{h_i}$. 
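For the expectation of the first run, our reading of Remark 5 is that the summand for the letter i becomes $1/(p_i^{-1} + \dots + p_i^{-h_i})$, since this equals $p_i^{h_i}(1 - p_i)/(1 - p_i^{h_i})$. The Python sketch below is an illustration under that reading; the function name is chosen here. For a fair coin with $h_1 = 1$ and $h_2 = 2$, a direct argument gives 3/2: either the first letter is 1 (one step), or it is 2 and the second letter ends the wait in either case (two steps).

```python
from fractions import Fraction


def expected_first_run_mixed(p, hs):
    """Expected first position at which some letter i completes an h_i-run."""
    weights = [
        1 / sum(Fraction(x) ** (-k) for k in range(1, h + 1))
        for x, h in zip(p, hs)
    ]
    return 1 / sum(weights)


print(expected_first_run_mixed([Fraction(1, 2), Fraction(1, 2)], [1, 2]))
```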


5. Algorithmic Aspects 

For fixed h, the occurrence of an h-run of the letter i can easily 
be detected by a transducer automaton reading the occurrence probabilities 
$p_i$ and outputting 1 whenever the letter i completes an h-run, 
see Figure 1 for the case r = 2, h = 3 and i = 2. 



Figure 1. Transducer detecting 3-runs of the letter 1. 
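A transducer of this kind can be sketched in a few lines of plain Python. This is an illustration only and not the SageMath construction: the function name is chosen here, the letters themselves (rather than their probabilities) are used as input symbols, and it is an assumption that the output is 1 at every position where the last h letters all equal the given letter, so that a run longer than h keeps producing 1.

```python
def run_detector(h, letter):
    """Return a function mapping an input word to the transducer's output word."""

    def transduce(word):
        # state = length of the current trailing run of `letter`, capped at h - 1
        state = 0
        output = []
        for a in word:
            if a != letter:
                state = 0
                output.append(0)
            elif state == h - 1:
                # the last h letters all equal `letter`: an h-run is completed
                output.append(1)
            else:
                state += 1
                output.append(0)
        return output

    return transduce


detect = run_detector(3, 2)
print(detect([2, 2, 2, 2, 1, 2, 2, 2]))
```

The states 0, ..., h - 1 play the role of the states of the automaton; the waiting time is the position of the first 1 in the output.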
The same can be done for the first occurrence of any h-run, see 
Figure 2 for r = 2 and h = 3. 


Figure 2. Transducer detecting the first 3-run of any letter. 


The first occurrence of j runs of length h could also be modelled by 
a transducer. 

Using the finite state machine package [4] of the SageMath Mathematics 
Software [6], such transducers can easily be constructed. 

Accompanying this article, in [3], an extension of SageMath to compute 
the expectation and the variance of the first occurrence of a 1 in 
the output of a transducer is proposed for inclusion into SageMath. 

Using this extension, the expectation and the variance of $B_1$ can be 
computed for fixed r and h as shown in Table 1. 

The results coincide with those obtained in Theorem 1. For more 
examples, see the documentation of moments_waiting_time. 

For j > 1, we did not compute $\mathbb{V}(B_j)$ in general. For fixed r and h, 
it can be computed by this algorithmic approach. 
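Independently of SageMath, the moments of $B_j$ for fixed r, h and j can be approximated in plain Python. The sketch below is not the moments_waiting_time method; it is a hypothetical stand-in that tracks which letters already had an h-run together with the current trailing run, computes $\mathbb{P}\{B_j > n\}$ up to an assumed cutoff `n_max`, and then uses the tail-sum formulas $\mathbb{E}(B_j) = \sum_n \mathbb{P}\{B_j > n\}$ and $\mathbb{V}(B_j) = 2\sum_n n\,\mathbb{P}\{B_j > n\} + \mathbb{E}(B_j) - \mathbb{E}(B_j)^2$. All names and the floating-point truncation are choices made here.

```python
from collections import defaultdict


def tail_probabilities(p, h, j, n_max):
    """Return [P{B_j > n} for n = 0..n_max] by a forward pass over states."""
    # state = (letters that already had an h-run, last letter, capped run length)
    dist = {(frozenset(), None, 0): 1.0}
    tail = [1.0]
    for _ in range(n_max):
        new = defaultdict(float)
        for (done, last, run), mass in dist.items():
            for a, pa in enumerate(p):
                new_run = run + 1 if a == last else 1
                new_done = done | {a} if new_run >= h else done
                if len(new_done) < j:  # fewer than j letters had an h-run so far
                    new[(new_done, a, min(new_run, h))] += mass * pa
        dist = new
        tail.append(sum(dist.values()))
    return tail


def waiting_time_moments(p, h, j, n_max=2000):
    """Approximate expectation and variance of B_j from truncated tail sums."""
    tail = tail_probabilities(p, h, j, n_max)
    g = sum(tail)
    g_prime = sum(n * t for n, t in enumerate(tail))
    return g, 2 * g_prime + g - g * g


print(waiting_time_moments([0.5, 0.5], h=2, j=1, n_max=400))
```

The cutoff has to be large enough for the neglected tail to be negligible; for a fair coin with h = 2 and j = 1 the result should be close to expectation 3 and variance 2.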

Obviously, the SageMath method can be used for computing first 
occurrences of everything which is recognisable by a transducer. On the 
other hand, explicit results for general r and h such as our Theorems 1 
and 2 cannot be obtained by that method. 
from sage.combinat import finite_state_machine as FSM 

# Deactivate deprecated code 
FSM.FSMOldCodeTransducerCartesianProduct = False 
FSM.FSMOldProcessOutput = False 

# Construct the polynomial ring and set up q 
R.<p> = QQ[] 
q = 1 - p 

# Construct the transducers detecting runs of single 
# letters. [p, p, p] is the block to detect, [p, q] 
# the alphabet 
p_runs = transducers.CountSubblockOccurrences( 
    [p, p, p], [p, q]) 
q_runs = transducers.CountSubblockOccurrences( 
    [q, q, q], [p, q]) 

# In order to detect runs of both letters, build the 
# cartesian product ... 
both_runs = p_runs.cartesian_product(q_runs) 

# ... and add up the output by concatenating with 
# the predefined "add" transducer on the alphabet 
# [0, 1]. We use the Python convention that any 
# non-zero integer evaluates to True in boolean 
# context. 
first_run = transducers.add([0, 1])(both_runs) 

# Declare it as a Markov chain 
first_run.on_duplicate_transition = \ 
    FSM.duplicate_transition_add_input 
print(first_run.moments_waiting_time()) 

Table 1. Computation of the moments for $B_1$ with r = 2 and h = 3 in SageMath. 


References 

[1] Philippe Flajolet, Daniele Gardy, and Loys Thimonier, Birthday paradox, 
coupon collectors, caching algorithms and self-organizing search, Discrete Appl. 
Math. 39 (1992), no. 3, 207-229. 

[2] Philippe Flajolet and Robert Sedgewick, Analytic combinatorics, Cambridge 
University Press, Cambridge, 2009. 

[3] Clemens Heuberger, FiniteStateMachine: Moments of waiting time, 
http://trac.sagemath.org/ticket/18070, 2015. 

[4] Clemens Heuberger, Daniel Krenn, and Sara Kropf, Automata and transducers 
in the computer algebra system Sage, 2014, arXiv: 1404.7458 [cs.FL]. 





[5] Peter J. Larcombe, Eric J. Fennessey, Wolfram A. Koepf, and David R. French, 
On Gould’s identity No. 1-45, Util. Math. 64 (2003), 19-24. 

[6] William A. Stein et al., Sage Mathematics Software (Version 6.5), The Sage 
Development Team, 2015, http://www.sagemath.org. 

[7] Gábor J. Székely, Paradoxes in probability theory and mathematical statistics, 
Mathematics and its Applications (East European Series), vol. 15, D. Reidel 
Publishing Co., Dordrecht, 1986, Translated from the Hungarian by Márta 
Alpár and Éva Unger. 

Institut für Stochastik und Anwendungen, Universität Stuttgart, 
Pfaffenwaldring 57, D-70569 Stuttgart, Germany 

E-mail address: uta.freiberg@mathematik.uni-stuttgart.de 

Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, 
Universitätsstraße 65-67, 9020 Klagenfurt, Austria 
E-mail address: clemens.heuberger@aau.at 

Department of Mathematical Sciences, Stellenbosch University, 7602 
Stellenbosch, South Africa 

E-mail address: hproding@sun.ac.za 


